ggplot2 is a package included in the tidyverse that’s great for data visualization.

Today we’ll learn ggplot2 basics:

library(gapminder)
library(tidyverse)



Basic plot: ggplot() + geom_point()

Take some data and build a scatterplot.


Run ?ggplot in your console to see the help docs for ggplot. There’s a lot of info there. We learn:

  • ggplot() initializes a ggplot object.
  • the argument data needs a "data.frame" object. We’re in luck, because gapminder is a "tbl" and a "data.frame".

Then we’ll add + geom_point() to draw a scatterplot:

  • We could have used lots of other geoms: there’s geom_line(), geom_boxplot(), geom_histogram(), etc. We’ll get to those later.
  • An aesthetic mapping takes a variable in the data and maps it to an aesthetic in the plot.
  • The first aesthetic mapping we’ll do is to map the variable gdpPercap to the x-axis and lifeExp to the y-axis of our plot.
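
Here is a minimal sketch of the plot described above, assuming the gapminder and ggplot2 packages are installed. It follows the axis labels used later in this tutorial (GDP/capita on the x-axis, life expectancy on the y-axis). The name p is just a placeholder.

```r
library(gapminder)
library(ggplot2)

# ggplot() initializes the plot object with the data and the aesthetic mappings
p <- ggplot(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp)) +
  geom_point()  # draw one point per row of gapminder

p  # printing the object draws the plot
```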

Exercise

Exercise 1: Draw a scatterplot that plots year on the x-axis and lifeExp on the y-axis. Does it seem like countries have had higher life expectancies over time?





Add labels: + labs()

Next we’ll add a title and adjust the labels on the x- and y-axes.

Check out ?labs:

  • labs() arguments:
    • ... : a list of name-value pairs where name is an aesthetic. We use the fact that x and y are plot aesthetics, and we give them values "GDP/capita" and "life expectancy".
    • title: we set as "GDP per capita correlates with life expectancy"
    • subtitle
    • caption
    • tag

Good titles explain something about what your plot means. However, that oftentimes leads to long titles. Since my title was running off the page, I decided to adjust the global font size. I did that with the theme() call.
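
A sketch of what that could look like, using the labels and title listed above and the theme() setting quoted in the next section. The object name p is a placeholder.

```r
library(gapminder)
library(ggplot2)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # x and y are plot aesthetics, so labs() accepts them as names
  labs(
    x = "GDP/capita",
    y = "life expectancy",
    title = "GDP per capita correlates with life expectancy"
  ) +
  # shrink the global font so the long title fits on the page
  theme(text = element_text(size = 10, color = "purple"))

p
```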

See the next section for more info on theme()!

Exercise

Exercise 2: Take the plot of life expectancy over year from the answer to Exercise 1 and add a title, caption, and tag.





More theme()

What else can we do with theme()? Check out ?theme

  • There are a ton of arguments.
  • The main ones are the first 3: line, rect, and text.
    • Other arguments inherit elements from these first arguments.
    • For example, we made all the text in our plot purple and point size 10 when we did this: text = element_text(size = 10, color = "purple")
    • If you only wanted to make the title size 10 and purple, you could instead do this: plot.title = element_text(size = 10, color = "purple").
    • But if plot.title is left unspecified, it will inherit elements from text.
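  • One apparent exception: under the default theme, setting rect doesn’t change panel.background. That isn’t a broken inheritance link. The default theme already sets the fill of panel.background explicitly, and an explicit setting takes precedence over anything inherited from rect.
  • As you can see, it’s easy to create some awful looking things. So for now we’ll use a preset theme. Type theme_ in your console and look through the autocomplete suggestions to see the options ggplot2 has.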
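
To make the inheritance idea concrete, here is a small sketch comparing the two theme() calls mentioned above. The object names base, all_text, and title_only are placeholders of mine.

```r
library(gapminder)
library(ggplot2)

base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  labs(title = "GDP per capita correlates with life expectancy")

# Setting text: elements that inherit from text pick up these values,
# unless the theme already sets that property on the element itself
all_text <- base + theme(text = element_text(size = 10, color = "purple"))

# Setting plot.title: only the title changes
title_only <- base + theme(plot.title = element_text(size = 10, color = "purple"))

all_text
title_only
```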





Presets: theme_*
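
A preset theme is added to a plot like any other layer. The sketch below uses theme_minimal(), which is only one of the presets that ship with ggplot2. Which preset to use is my choice for illustration; theme_bw() and theme_classic() work the same way.

```r
library(gapminder)
library(ggplot2)

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  theme_minimal()  # swap in theme_bw(), theme_classic(), etc. to compare

p
```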






Fit a line: + geom_smooth()

geom_smooth() draws smoothed conditional means. Here, it adds another layer of graphics on top of the scatterplot.

  • Check out ?geom_smooth
  • Use geom_smooth(method = lm) to get a straight line (OLS)

The geom will inherit data and also aesthetic mappings from the ggplot call. So for cleaner looking code I can write this:
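
A sketch of the shorter form, with the longer equivalent alongside for comparison. The names p_short and p_long are placeholders.

```r
library(gapminder)
library(ggplot2)

# Long form: repeat the data and mappings inside geom_smooth()
p_long <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(data = gapminder, mapping = aes(x = gdpPercap, y = lifeExp), method = lm)

# Short form: geom_smooth() inherits data and mappings from ggplot()
p_short <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  geom_smooth(method = lm)  # lm gives a straight OLS line

p_short
```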





Scales: scale_x_log10()

The scatterplot is fan-shaped, which is a sign you might want to take the log of one (or both) of the axes. Here are 2 techniques that will lead to almost the same result.

Note the difference in the breaks on the x-axis. log10(1000) = 3, but log GDP/cap = 3 is harder to decipher than GDP/cap = 1,000.
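
The two techniques could look like this. I am assuming the log is applied to the x-axis only (gdpPercap), as the section title suggests. The names p_transform and p_scale are placeholders.

```r
library(gapminder)
library(ggplot2)

# Technique 1: transform the variable inside aes().
# The axis breaks are in log units (2, 3, 4, ...).
p_transform <- ggplot(gapminder, aes(x = log10(gdpPercap), y = lifeExp)) +
  geom_point()

# Technique 2: keep the variable as-is and transform the scale.
# The axis breaks stay in the original units (1,000, 10,000, ...).
p_scale <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

p_transform
p_scale
```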





Color to represent continent

Next I want to color the points by continent. That’s another aesthetic mapping. Just like gdpPercap is mapped to x and lifeExp is mapped to y, we can map continent to color.
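
A minimal sketch of that mapping. The only change from the basic scatterplot is the extra color argument inside aes().

```r
library(gapminder)
library(ggplot2)

# color = continent is inside aes(), so it is an aesthetic mapping:
# each continent gets its own color and a legend is drawn automatically
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point()

p
```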

Exercise

Exercise 3: Instead of mapping continent to color, map continent to shape. What’s the default shape scale?



Color to fixed value

Suppose instead of mapping continent to color, I wanted to color all the dots pink. That’s not an aesthetic mapping because you’re not taking information in the data and representing it with aesthetics in the plot. You’ll implement this by writing color = "pink" in the geom_point() call, but not wrapped with aes().
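
In code, the difference is only where the color argument sits:

```r
library(gapminder)
library(ggplot2)

# color is outside aes(), so it is a fixed value applied to every point,
# not a mapping from a variable in the data
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(color = "pink")

p
```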





Adjust color scale: scale_color_manual()

Go back to mapping continent to color. Say I don’t like this default color scale. That’s another scale I can adjust.

continent is a factor variable with 5 levels, so I’ll need to pick out 5 colors.

class(gapminder$continent)
## [1] "factor"
levels(gapminder$continent)
## [1] "Africa"   "Americas" "Asia"     "Europe"   "Oceania"

R recognizes many colors by name, like "ivory3". Run colors() in your console to list them.

I prefer to just google "color picker" and use the widget there to get hex codes like "#553469".
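
A sketch of a manual color scale with five values, one per continent level. Apart from "ivory3" and "#553469", which appear above, the colors are arbitrary picks of mine.

```r
library(gapminder)
library(ggplot2)

# Five colors, matched to the factor levels in order:
# Africa, Americas, Asia, Europe, Oceania
my_colors <- c("#553469", "ivory3", "tomato", "steelblue", "darkgreen")

p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_color_manual(values = my_colors)

p
```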

Exercise

Exercise 4: Instead of using aes(color = continent) and adjusting the color scale, use aes(color = continent, shape = continent) and adjust the shape scale along with the color scale. Try scale_shape_manual().





Adjust transparency: alpha

Whenever points overlap a lot like this, it’s a good idea to try adjusting the transparency of the points. We can do that by setting alpha. alpha must be a number between 0 and 1. The default is 1, and the closer it is to 0, the more transparent the points are.
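
For example, a sketch with alpha set to 0.3 (the exact value is an arbitrary choice for illustration):

```r
library(gapminder)
library(ggplot2)

# alpha is outside aes(), so every point gets the same transparency;
# dense regions of the plot appear darker where points pile up
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point(alpha = 0.3)

p
```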

Point size

Now I want to adjust the size of the points. Let’s make all the points larger, then smaller. To affect all points at once, I’ll put size outside of the aes() call.
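
A sketch of both versions. The sizes 3 and 0.5 are arbitrary values chosen for illustration.

```r
library(gapminder)
library(ggplot2)

base <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp))

# size is a fixed value here, so it applies to every point
p_large <- base + geom_point(size = 3)
p_small <- base + geom_point(size = 0.5)

p_large
p_small
```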



Map pop to size

I can also map population to size, so big countries get big points and small countries get small points. To do that, I’ll put size = pop in the aes() call!
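
A minimal sketch of the size mapping:

```r
library(gapminder)
library(ggplot2)

# size = pop is inside aes(), so point size now carries information
# from the data and ggplot2 adds a size legend
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop)) +
  geom_point()

p
```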

Faceting: facet_wrap()

We’re nearly done for today! One of the last things we’ll talk about is faceting. Notice we have all the years of data mashed into one plot here? Suppose I wanted to draw a different plot for each year in the dataset. There’s a way to quickly do that, and it’s called faceting.
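
A sketch of faceting by year with facet_wrap(), using the formula notation:

```r
library(gapminder)
library(ggplot2)

# facet_wrap(~ year) draws one panel per distinct value of year
p <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  facet_wrap(~ year)

p
```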

Exercise

Exercise 5: Use facet_wrap() to facet by continent instead of year. If you wanted to see growth in GDP/capita and life expectancy over time, how would you visualize it here?



Animation: gganimate::transition_time()

Finally, instead of breaking the data out into many plots, we can overlay the plots and create an animation! I use gganimate::transition_time() here, and I also decided to replace geom_point() with geom_text().
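
A rough sketch of how that might be set up, assuming the gganimate package is installed. Using country as the text label and showing the year in the title are my assumptions, not details given above.

```r
library(gapminder)
library(ggplot2)
library(gganimate)

anim <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, label = country)) +
  geom_text() +                       # assumption: label each observation with its country
  transition_time(year) +             # one animation state per point in time
  labs(title = "Year: {frame_time}")  # gganimate fills in the current year

# animate(anim) renders the frames; it is left commented out here
# because rendering needs a renderer package such as gifski
# animate(anim)
```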

Review

We’ve covered a lot of ground! Here are the things we’ve learned:

Resources

Assignment 3: get to know more geoms

3.1 geom_line()

Use the gapminder package to draw a line plot showing how lifeExp has changed over time for a few different countries.

3.2 geom_bar() and geom_histogram()

Use geom_bar() to make a bar plot, then use geom_histogram() to make a histogram.

What’s the difference? Bar plots take categorical data like country and continent, while histograms take continuous data like gdpPercap and lifeExp.

For your bar plot, compare the number of observations in the data for each continent.

For your histogram, compare the frequency of observations with gdpPercap inside some intervals. Use only data from 2007.

3.3 geom_abline(), geom_vline(), and geom_hline()

You can use these three geoms to add straight lines to your plot. Take the histogram you drew in 3.2 and add a vertical line with geom_vline() at the international poverty line, currently set at $1.90 per day ($693.50 per year).